From Episodes to Sagas: Understanding the News by Identifying Temporally Related Story Sequences

Authors

  • Ramnath Balasubramanyan
  • Frank Lin
  • William W. Cohen
  • Matthew Hurst
  • Noah A. Smith
Abstract

News interfaces are largely driven by recent information, even though many events are better interpreted in the context of previous events. To address this problem, we consider the task of constructing an explicit representation of a “saga”, a long-running series of related events. We define a timeline as a concrete representation of a saga, propose two unsupervised methods for timeline construction, and compare their output to hand-produced timelines using a tree edit distance measure. Preliminary results using these techniques on a weblog corpus and a supplementary news corpus are presented, showing both promise and challenges.

Introduction: Why Timelines Are Useful

One limitation of most current news interfaces is that they are largely driven by recent information: most of the user’s attention is directed toward events of the last few hours or even minutes. This leads to a view of current events that is broad but shallow, whereas many events are better interpreted in the context of previous related events. The hypothesis behind our work is that it is useful to construct representations of such “sagas”, i.e., long-running sequences of related events. From an application perspective, we are interested in providing tools that give a reader the complete narrative context of a given event. From a sociological perspective, we are interested in finding out how events are perceived, reported, and synthesized in a number of media types, including news and weblogs.

To ground the problem more precisely, we propose a simple model of events and their relationships. An event, represented by a node in an event graph, has a time and a duration, and is described in some appropriate manner. The edges in an event graph encode binary relations between events. These relations could be simple temporal relationships, but they could also capture causality or other deeper relationships. While the general notion of an event graph is useful, in this paper we focus on two special cases.
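The event-graph model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class names `Event` and `EventGraph`, the method names, and the relation label `"precedes"` are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A node in the event graph (hypothetical name): a description, a time, a duration."""
    description: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        # End of the event, derived from its start time and duration.
        return self.start + self.duration


@dataclass
class EventGraph:
    """Events plus labeled binary relations between them (hypothetical name)."""
    events: list = field(default_factory=list)
    # Each edge is (source index, relation label, target index).
    edges: set = field(default_factory=set)

    def add_event(self, event: Event) -> int:
        # Store the event and return its index, which serves as the node id.
        self.events.append(event)
        return len(self.events) - 1

    def relate(self, src: int, relation: str, dst: int) -> None:
        # Record a binary relation, e.g. temporal ("precedes") or causal ("causes").
        self.edges.add((src, relation, dst))

    def related(self, src: int, relation: str) -> list:
        # Indices of all events reachable from src by one edge with this label.
        return sorted(d for s, r, d in self.edges if s == src and r == relation)
```

Keeping the relation label on each edge lets one graph hold temporal and causal links side by side, matching the text's remark that edges may capture more than simple temporal order.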
One special case is a simple timeline, i.e., a linear sequence of events. In this work we mine timelines from weblogs. One potential advantage of using social media is that it provides information about the relative importance of events as perceived by members of a social community, thus providing a more normative view of which events should be in a timeline; likewise, social media provides information about the appropriate granularity of events.

Challenges in Constructing Timelines

Our long-term goal is to automatically construct a cohesive narrative for a “saga” that is easily accessible to the user and that facilitates in-depth study of news on any topic. This is a difficult task: deciding which past stories give the best context for an event requires many subtle judgments about relevance, entity identity, and so on.

We begin by proposing a precise notion of a timeline. A timeline TL is a sequence of event nodes n_1, ..., n_k, each of which corresponds to an event e_i, by which we mean, informally, something that happened in the “real world.” Each event node n_i has an associated time span t_i, indicating the duration of the associated event, and a textual description q_i. Given a particular corpus C of documents, an event node n_i can also be associated with a binary classifier r_i^C, which labels each document d ∈ C with an indicator of whether or not it is relevant to event e_i. We call r_i^C an event classifier. A common way of summarizing a timeline on the web is to provide, for each event node n_i, the time span, a description, and a small sample of relevant documents.

In summary, for this paper we define a complete timeline T(C) over a corpus C as a set of event nodes n_1, ..., n_k, each of which has the following properties: 1) a short textual description q_i; 2) an associated real-world event e_i; 3) a time span t_i indicating the duration of e_i, where t_i is further defined by a start and end time; 4) an indication of which documents d ∈ C are relevant to n_i, represented as a function r_i(d), where r_i(d) = 1 iff d is relevant to n_i; and 5) a sample S_i of highly relevant documents from C.

We use the term timeline completion for the task of completing a partially specified timeline. Each kind of incompletely specified timeline leads to a slightly different technical problem, many of which can be mapped to well-studied tasks in learning, natural language processing, and information retrieval. If only C is given, then finding r_i is an unsupervised clustering problem, which we focus on in the remainder of this paper.

Proceedings of the Third International ICWSM Conference (2009), p. 179. Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
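The five properties of an event node listed in the timeline definition can be mirrored in a small data structure. The sketch below is an assumption-laden illustration: `EventNode`, `keyword_classifier`, and `build_timeline` are hypothetical names, and the keyword match merely stands in for an event classifier r_i. It is not one of the paper's unsupervised methods, which this excerpt does not describe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class EventNode:
    """One event node n_i of a timeline (hypothetical class name)."""
    description: str                   # q_i: short textual description
    start: date                        # start of the time span t_i
    end: date                          # end of the time span t_i
    is_relevant: Callable[[str], int]  # r_i(d): 1 iff document d is relevant

    def sample(self, corpus: List[str], k: int = 2) -> List[str]:
        # S_i: here simply the first k relevant documents, a stand-in for
        # whatever "highly relevant" ranking a real system would use.
        return [d for d in corpus if self.is_relevant(d) == 1][:k]


def keyword_classifier(keyword: str) -> Callable[[str], int]:
    # Toy event classifier (assumption): relevant iff the keyword occurs in d.
    return lambda doc: 1 if keyword.lower() in doc.lower() else 0


def build_timeline(nodes: List[EventNode]) -> List[EventNode]:
    # A timeline as a linear sequence of event nodes ordered by start time.
    return sorted(nodes, key=lambda n: n.start)
```

Representing r_i as a plain function keeps the structure independent of how relevance is obtained, whether by clustering, a supervised model, or a hand-written rule.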

Similar resources

Allegory: Structure, Interpretation and Polysemy

In allegories polysemy relates not only to the context and the audience’s understanding but also to the structural characters of these texts. This paper investigates the function of structural and narrative properties in the creation of multiple interpretations of an allegory. Focusing on the events and following a unique story-line is the most important trait in helping to read the alleg...

LEARNER INITIATIVES ACROSS QUESTION-ANSWER SEQUENCES: A CONVERSATION ANALYTIC ACCOUNT OF LANGUAGE CLASSROOM DISCOURSE

This paper investigates learner-initiated responses to English language teachers’ referential questions and learner initiatives after teachers’ feedback moves in meaning-focused question-answer sequences to analyze how interactional practices of language teachers, their initiation and feedback moves, facilitate learner initiatives. Classroom discourse research has largely neglected learner init...

A Conversation Analytic Study on the Teachers’ Management of Understanding-Check Question Sequences in EFL Classrooms

Teacher questions are claimed to be constitutive of classroom interaction because of their crucial role both in the construction of knowledge and the organization of classroom proceedings (Dalton Puffer, 2007). Most of previous research on teachers’ questions mainly focused on identifying and discovering different question types believed to be helpful in creating the opportunities for learners’...

Making Sense of Issues Through Media Frames: Understanding the Kosovo Crisis

How do people make sense of politics? Integrating empirical results in communication studies on framing with models of comprehension in cognitive psychology, we argue that people understand complicated event sequences by organizing information in a manner that conforms to the structure of a good story. To test this claim, we carried out a pair of experiments. In each, we presented people with n...

Unsupervised topic discovery applied to segmentation of news transcriptions

Audio transcriptions from Automatic Speech Recognition systems are a continuous stream of words that are difficult to read. Segmenting these transcriptions into thematically distinct stories and categorizing the stories by topics increases readability and comprehensibility. However, manually defined topic categories are rarely available, and the cost of annotating a large corpus with thousands ...

Publication date: 2009